In [1]:
library(Seurat)
library(ggplot2)
library(dplyr)
options(repr.plot.width=10, repr.plot.height=8)
Loading required package: SeuratObject

Loading required package: sp

‘SeuratObject’ was built under R 4.4.1 but the current version is
4.4.2; it is recomended that you reinstall ‘SeuratObject’ as the ABI
for R may have changed

‘SeuratObject’ was built with package ‘Matrix’ 1.6.5 but the current
version is 1.7.3; it is recomended that you reinstall ‘SeuratObject’ as
the ABI for ‘Matrix’ may have changed


Attaching package: ‘SeuratObject’


The following objects are masked from ‘package:base’:

    intersect, t



Attaching package: ‘dplyr’


The following objects are masked from ‘package:stats’:

    filter, lag


The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union


In [2]:
in_path <- "~/projects/2024/RA/GSE136035_RAW/"
out_path <- "~/projects/2024/RA/GSE136035_RAW/output/"
In [3]:
f_barcodes <- dir(path = in_path, pattern = "*barcodes.tsv.gz")
f_barcodes
  1. 'GSM4039805_165_barcodes.tsv.gz'
  2. 'GSM4039807_178_barcodes.tsv.gz'
  3. 'GSM4039809_182_barcodes.tsv.gz'
  4. 'GSM4039813_177_barcodes.tsv.gz'
  5. 'GSM4039815_178_T1_barcodes.tsv.gz'
  6. 'GSM4039819_177_T1_barcodes.tsv.gz'
  7. 'GSM4885423_HC2_barcodes.tsv.gz'
  8. 'GSM4885425_HC3_barcodes.tsv.gz'
In [4]:
d_name <- character()
f_features <- character()
f_matrix <- character()
for (i in 1:length(f_barcodes)) {
    d_name[i] <- substr(f_barcodes[i], 1, 10)
    f_features[i] <- paste0(substr(f_barcodes[i], 1, (nchar(f_barcodes[i])-15)),"genes.tsv.gz")
    f_matrix[i] <- paste0(substr(f_barcodes[i], 1, (nchar(f_barcodes[i])-15)),"matrix.mtx.gz")
    if (!file.exists(paste0(in_path,d_name[i],"/barcodes.tsv.gz"))) {
        dir.create(paste0(in_path,d_name[i]))
        file.copy(paste0(in_path,f_barcodes[i]), paste0(in_path,d_name[i],"/barcodes.tsv.gz"))
        file.copy(paste0(in_path,f_features[i]), paste0(in_path,d_name[i],"/features.tsv.gz"))
        file.copy(paste0(in_path,f_matrix[i]), paste0(in_path,d_name[i],"/matrix.mtx.gz"))
    }
}
In [5]:
scdata <- list()
for (i in 1:length(f_barcodes)) {
    data <- Read10X(data.dir = paste0(in_path,d_name[i]))
    scdata[[i]] <- CreateSeuratObject(counts = data, project = d_name[i])
    i
}
scdata
Warning message:
“Feature names cannot have underscores ('_'), replacing with dashes ('-')”
Warning message:
“Feature names cannot have underscores ('_'), replacing with dashes ('-')”
Warning message:
“Feature names cannot have underscores ('_'), replacing with dashes ('-')”
Warning message:
“Feature names cannot have underscores ('_'), replacing with dashes ('-')”
Warning message:
“Feature names cannot have underscores ('_'), replacing with dashes ('-')”
Warning message:
“Feature names cannot have underscores ('_'), replacing with dashes ('-')”
10X data contains more than one type and is being returned as a list containing matrices of each type.

Warning message:
“Feature names cannot have underscores ('_'), replacing with dashes ('-')”
10X data contains more than one type and is being returned as a list containing matrices of each type.

Warning message:
“Feature names cannot have underscores ('_'), replacing with dashes ('-')”
[[1]]
An object of class Seurat 
33694 features across 2704 samples within 1 assay 
Active assay: RNA (33694 features, 0 variable features)
 1 layer present: counts

[[2]]
An object of class Seurat 
33694 features across 667 samples within 1 assay 
Active assay: RNA (33694 features, 0 variable features)
 1 layer present: counts

[[3]]
An object of class Seurat 
33694 features across 1862 samples within 1 assay 
Active assay: RNA (33694 features, 0 variable features)
 1 layer present: counts

[[4]]
An object of class Seurat 
33694 features across 2079 samples within 1 assay 
Active assay: RNA (33694 features, 0 variable features)
 1 layer present: counts

[[5]]
An object of class Seurat 
33694 features across 912 samples within 1 assay 
Active assay: RNA (33694 features, 0 variable features)
 1 layer present: counts

[[6]]
An object of class Seurat 
33694 features across 1173 samples within 1 assay 
Active assay: RNA (33694 features, 0 variable features)
 1 layer present: counts

[[7]]
An object of class Seurat 
33542 features across 9830 samples within 1 assay 
Active assay: RNA (33542 features, 0 variable features)
 2 layers present: counts.Gene Expression, counts.Antibody Capture

[[8]]
An object of class Seurat 
33542 features across 8893 samples within 1 assay 
Active assay: RNA (33542 features, 0 variable features)
 2 layers present: counts.Gene Expression, counts.Antibody Capture
In [6]:
GSE136035 <- merge(scdata[[1]], y = scdata[-1], add.cell.ids = d_name)
In [7]:
GSE136035
An object of class Seurat 
45072 features across 28120 samples within 1 assay 
Active assay: RNA (45072 features, 0 variable features)
 10 layers present: counts.GSM4039805, counts.GSM4039807, counts.GSM4039809, counts.GSM4039813, counts.GSM4039815, counts.GSM4039819, counts.Gene Expression.GSM4885423, counts.Antibody Capture.GSM4885423, counts.Gene Expression.GSM4885425, counts.Antibody Capture.GSM4885425
In [8]:
GSE136035[["percent.mt"]] <- PercentageFeatureSet(GSE136035, pattern = "^MT-")
In [9]:
GSE136035
An object of class Seurat 
45072 features across 28120 samples within 1 assay 
Active assay: RNA (45072 features, 0 variable features)
 10 layers present: counts.GSM4039805, counts.GSM4039807, counts.GSM4039809, counts.GSM4039813, counts.GSM4039815, counts.GSM4039819, counts.Gene Expression.GSM4885423, counts.Antibody Capture.GSM4885423, counts.Gene Expression.GSM4885425, counts.Antibody Capture.GSM4885425
In [10]:
VlnPlot(GSE136035, features = c("nFeature_RNA"), pt.size = 0)
VlnPlot(GSE136035, features = c("nCount_RNA"), pt.size = 0)
VlnPlot(GSE136035, features = c("percent.mt"), pt.size = 0)
Warning message:
“Default search for "data" layer in "RNA" assay yielded no results; utilizing "counts" layer instead.”
Warning message:
“Default search for "data" layer in "RNA" assay yielded no results; utilizing "counts" layer instead.”
No description has been provided for this image
Warning message:
“Default search for "data" layer in "RNA" assay yielded no results; utilizing "counts" layer instead.”
No description has been provided for this image
No description has been provided for this image
In [11]:
p1 <- FeatureScatter(GSE136035, feature1 = "nCount_RNA", feature2 = "percent.mt", raster = F)
p2 <- FeatureScatter(GSE136035, feature1 = "nCount_RNA", feature2 = "nFeature_RNA", raster = F)
p1
p2
No description has been provided for this image
No description has been provided for this image
In [12]:
GSE136035 <- subset(GSE136035, subset = nFeature_RNA > 200 & nFeature_RNA < 5000 & percent.mt <20)
GSE136035 <- JoinLayers(GSE136035)
GSE136035
An object of class Seurat 
45072 features across 27598 samples within 1 assay 
Active assay: RNA (45072 features, 0 variable features)
 1 layer present: counts
In [13]:
GSE136035[["RNA"]] <- split(GSE136035[["RNA"]], f = GSE136035$orig.ident)
In [14]:
GSE136035
An object of class Seurat 
45072 features across 27598 samples within 1 assay 
Active assay: RNA (45072 features, 0 variable features)
 8 layers present: counts.GSM4039805, counts.GSM4039807, counts.GSM4039809, counts.GSM4039813, counts.GSM4039815, counts.GSM4039819, counts.GSM4885423, counts.GSM4885425
In [15]:
GSE136035 <- JoinLayers(GSE136035)
In [16]:
GSE136035 <- NormalizeData(GSE136035, normalization.method = "LogNormalize", scale.factor = 10000)
GSE136035 <- FindVariableFeatures(GSE136035, selection.method = "vst", nfatures = 2000)
GSE136035 <- ScaleData(GSE136035, features = rownames(GSE136035))
GSE136035 <- RunPCA(GSE136035)
Normalizing layer: counts

Finding variable features for layer counts

Centering and scaling data matrix

PC_ 1 
Positive:  FCN1, SERPINA1, LYZ, S100A9, S100A8, CFD, CEBPD, CSTA, CLEC7A, APOBEC3A 
	   CD68, CLEC12A, LGALS2, FPR1, LILRA2, PILRA, AIF1, MS4A6A, LILRB2, LRP1 
	   LST1, S100A12, CSF3R, CD36, MNDA, LILRB3, TNFSF13B, C5AR1, TYMP, CD300E 
Negative:  ATP5MC3, ATP5F1A, ATP5PF, ATP5MF, SELENOK, IL7R, IGHM, PASK, ATP5MC1, BUD23 
	   TENT5C, CD27, ELOC, AL139020.1, CENPX, ARL6IP1, IGLL5, LINC01781, MESD, IGHV5-78 
	   LINC01857, TRAT1, IGHV1-69D, JPT1, GATD3B, LIME1, EAF2, AC078883.1, SRP19, JCHAIN 
PC_ 2 
Positive:  LYZ, S100A8, FCN1, S100A9, AIF1, CSTA, SERPINA1, S100A12, FTH1, FPR1 
	   CLEC7A, LST1, LGALS2, CSF3R, LRP1, CLEC12A, MS4A6A, CD36, APOBEC3A, CFD 
	   CFP, RNF130, LILRA2, APLP2, C5AR1, TNFSF13B, CD163, PILRA, LILRB3, MPEG1 
Negative:  TNFRSF17, MZB1, HRASLS2, DERL3, FKBP11, JCHAIN, SDF2L1, PPIB, TYMS, HSP90B1 
	   PDIA4, SSR3, MANF, GPRC5D, MYDGF, CAV1, SUB1, CHPF, BIRC5, ITM2C 
	   ABCB9, PDIA6, PRDX4, SEC61B, RRM2, SPCS3, HSPA5, ELL2, BIK, UBE2J1 
PC_ 3 
Positive:  MTRNR2L12, AC090498.1, ATP5E, ATP5L, ATP5D, ATP5B, ATPIF1, ATP5I, ATP5G3, UQCRHL 
	   ATP5J2, USMG5, ATP5A1, C14orf2, ATP5F1, MTRNR2L8, ATP5C1, CD74, ATP5G1, SELT 
	   ATP5H, HLA-DQA2, RP5-1171I10.5, SELK, KIAA0226L, RP11-731F5.2, RPS4Y1, IGHM, FCRL3, TMEM2 
Negative:  ATP5MC3, ATP5F1A, ATP5MF, ATP5PF, IL7R, SELENOK, S100A4, SRGN, ATP5MC1, CRIP1 
	   PASK, CD27, S100A10, JPT1, LIME1, RAB5IF, ELOC, MESD, BUD23, NUCB2 
	   AIF1, GZMM, CENPX, CYTOR, ZYX, CFP, FCN1, TRAT1, S100A6, ARL6IP1 
PC_ 4 
Positive:  CD74, IGHM, JCHAIN, EAF2, IGLL5, MZB1, IGKC, CYBB, SPI1, POU2AF1 
	   ITM2C, RHOB, DERL3, UBE2J1, TNFRSF17, IGHV5-78, HERPUD1, IGLC2, PLD4, MPEG1 
	   PPP1R14A, SEMA4A, LILRB4, CPNE5, APLP2, LINC01781, TENT5C, KLHL14, IGHA1, IGHA2 
Negative:  GNLY, PRF1, GZMA, KLRD1, GZMB, CTSW, CCL5, FGFBP2, GZMH, CLIC3 
	   KLRF1, GZMM, CCL4, IL2RB, ADGRG1, SH2D1B, S1PR5, SAMD3, IGFBP7, S100A4 
	   IL7R, FCGR3A, CD160, PTGDR, TTC38, TBX21, C1orf21, DOK2, ID2, FCRL6 
PC_ 5 
Positive:  GZMB, GNLY, KLRD1, FGFBP2, KLRF1, CD74, PRF1, GZMH, IGFBP7, CLIC3 
	   CCL4, SH2D1B, FCGR3A, GZMA, ADGRG1, S1PR5, TTC38, CD160, IGHM, HOPX 
	   CCL5, KLRC1, XCL2, FCRL6, PTGDR, EAF2, C1orf21, IL2RB, JCHAIN, RHOC 
Negative:  IL7R, TRAT1, CISH, NPDC1, CD27, S100A11, S100A4, FYB, LIME1, S100A10 
	   S100A6, PASK, GATA3, SLC40A1, TIMP1, CITED2, ZYX, GAPDH, ATP5H, ITGB1 
	   LDHA, ATP5L, USMG5, PCSK1N, IFI6, PTGER2, TXN, CRIP2, CRIP1, ACTG1 

In [17]:
ElbowPlot(GSE136035)
No description has been provided for this image
In [18]:
GSE136035 <- FindNeighbors(GSE136035, dims = 1:15, reduction = "pca")
GSE136035 <- FindClusters(GSE136035, resolution = 0.5, cluster.name = "unintegrated_clusters")
Computing nearest neighbor graph

Computing SNN

Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 27598
Number of edges: 894220

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9305
Number of communities: 20
Elapsed time: 5 seconds
In [19]:
GSE136035 <- RunUMAP(GSE136035, dims = 1:15, reduction = "pca", reduction.name = "umap.unintegrated")
Warning message:
“The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
This message will be shown once per session”
12:17:38 UMAP embedding parameters a = 0.9922 b = 1.112

12:17:38 Read 27598 rows and found 15 numeric columns

12:17:38 Using Annoy for neighbor search, n_neighbors = 30

12:17:38 Building Annoy index with metric = cosine, n_trees = 50

0%   10   20   30   40   50   60   70   80   90   100%

[----|----|----|----|----|----|----|----|----|----|

*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
|

12:17:41 Writing NN index file to temp file /tmp/RtmpRGOn89/file32c80380fb5a4

12:17:41 Searching Annoy index using 1 thread, search_k = 3000

12:17:51 Annoy recall = 100%

12:17:52 Commencing smooth kNN distance calibration using 1 thread
 with target n_neighbors = 30

12:17:54 Initializing from normalized Laplacian + noise (using RSpectra)

12:17:55 Commencing optimization for 200 epochs, with 1202276 positive edges

12:17:55 Using rng type: pcg

12:18:13 Optimization finished

In [20]:
GSE136035
Idents(GSE136035) <- GSE136035$orig.ident

DimPlot(GSE136035, reduction = "umap.unintegrated")
An object of class Seurat 
45072 features across 27598 samples within 1 assay 
Active assay: RNA (45072 features, 2000 variable features)
 3 layers present: counts, data, scale.data
 2 dimensional reductions calculated: pca, umap.unintegrated
No description has been provided for this image
In [21]:
#GSE136035 <- IntegrateLayers(
#  object = GSE136035, method = HarmonyIntegration,
#  orig.reduction = "pca", new.reduction = "harmony",
#  verbose = FALSE
#)
In [22]:
library(harmony)
GSE136035 <- RunHarmony(GSE136035,"orig.ident", plot_convergence = T)
Loading required package: Rcpp

Transposing data matrix

Initializing state using k-means centroids initialization

Harmony 1/10

Harmony 2/10

Harmony 3/10

Harmony converged after 3 iterations

No description has been provided for this image
In [23]:
harmony_embeddings <- Embeddings(GSE136035, "harmony")
harmony_embeddings[1:5, 1:5]
A matrix: 5 × 5 of type dbl
harmony_1harmony_2harmony_3harmony_4harmony_5
GSM4039805_AAACCTGAGGCATGGT-1 1.5389411-1.676452236.9010755 0.2210354-1.509808278
GSM4039805_AAACCTGCACATGTGT-1-0.3147648 0.081354630.6775997-3.6687835-5.617223173
GSM4039805_AAACCTGCAGTTCATG-1 3.0815150-2.307293877.1943725-2.3830826 1.419970846
GSM4039805_AAACCTGGTATCAGTC-1 0.5034509 0.122213616.1371670 1.0243512 0.106980311
GSM4039805_AAACCTGGTGCCTGGT-1 2.5056201-1.096205637.6866434 1.6971614-0.002098452
In [24]:
DimPlot(GSE136035, reduction = "harmony", group.by = "orig.ident",raster = F)
No description has been provided for this image
In [25]:
GSE136035 <- RunUMAP(GSE136035, dims = 1:15, reduction = "harmony")
12:18:45 UMAP embedding parameters a = 0.9922 b = 1.112

12:18:45 Read 27598 rows and found 15 numeric columns

12:18:45 Using Annoy for neighbor search, n_neighbors = 30

12:18:45 Building Annoy index with metric = cosine, n_trees = 50

0%   10   20   30   40   50   60   70   80   90   100%

[----|----|----|----|----|----|----|----|----|----|

*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
|

12:18:49 Writing NN index file to temp file /tmp/RtmpRGOn89/file32c8032d7ce10c

12:18:49 Searching Annoy index using 1 thread, search_k = 3000

12:18:59 Annoy recall = 100%

12:19:00 Commencing smooth kNN distance calibration using 1 thread
 with target n_neighbors = 30

12:19:01 Initializing from normalized Laplacian + noise (using RSpectra)

12:19:03 Commencing optimization for 200 epochs, with 1212172 positive edges

12:19:03 Using rng type: pcg

12:19:20 Optimization finished

In [26]:
GSE136035 <- FindNeighbors(GSE136035, reduction = "harmony", dims = 1:15)
GSE136035 <- FindClusters(GSE136035, resolution = 0.5, cluster.name = "harmony_clusters")
Computing nearest neighbor graph

Computing SNN

Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 27598
Number of edges: 885726

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9142
Number of communities: 17
Elapsed time: 5 seconds
In [27]:
DimPlot(GSE136035, raster = F, reduction = "umap")
DimPlot(GSE136035, raster = F, reduction = "umap", group.by = "orig.ident")
pdf(paste0(out_path, "UMAP_harmony.pdf"), width=8, height=6)
DimPlot(GSE136035, raster = F, reduction = "umap")
DimPlot(GSE136035, raster = F, reduction = "umap", group.by = "orig.ident")
dev.off()
No description has been provided for this image
pdf: 2
No description has been provided for this image
In [28]:
ABC_marker_1 <- c("CD19", "CD80", "CD86", "FAS", "FCRLA", "FCRLB", "FGL2", "CXCL10", "CXCR3", "NKG7", "ITGAM", "ITGB2", "TBX21", "ZBTB32", "FCER2", "CR2")
length(ABC_marker_1)
is.element(ABC_marker_1,rownames(GSE136035))
ABC_marker_1 <- ABC_marker_1[is.element(ABC_marker_1,rownames(GSE136035))]
16
  1. TRUE
  2. TRUE
  3. TRUE
  4. TRUE
  5. TRUE
  6. TRUE
  7. TRUE
  8. TRUE
  9. TRUE
  10. TRUE
  11. TRUE
  12. TRUE
  13. TRUE
  14. TRUE
  15. TRUE
  16. TRUE
In [29]:
FeaturePlot(GSE136035, features = ABC_marker_1[1:4], ncol = 2, reduction = "umap")
No description has been provided for this image
In [30]:
fig1_1 <- FeaturePlot(GSE136035, features = ABC_marker_1[1:9], ncol = 3, reduction = "umap")
fig1_2 <- FeaturePlot(GSE136035, features = ABC_marker_1[10:16], ncol = 3, reduction = "umap")
pdf(paste0(out_path, "ABC_marker_feature.pdf"), width=12, height=10)
fig1_1
fig1_2
dev.off()
pdf: 2
In [31]:
source("~/projects/MK_Rcode/setZ.R")
GSE136035[["ABC_score"]] <- setZ(ABC_marker_1,rownames(GSE136035@assays$RNA$scale.data),GSE136035@assays$RNA$scale.data)
In [32]:
LSP1_gene <- c("CEBPA", "CEBPB", "IL1B", "CCL3", "TNF", "CCL4", "CCL5", "SPP1", "CD7", "MSR1", "ITGAM", "KIT", "CD14", "CSF3R", "IL1R2", "CSF1R", "ITGA1", "ITGA5", "ITGB5", "ITGB1", "CAMP", "THBS1", "C3", "MEFV", "FOS", "CYBB", "CARD9", "IRF7", "OAS3", "NLRP3", "CTSB", "IKBKE", "MPO")
length(LSP1_gene)
LSP1_gene_sub <- LSP1_gene[is.element(LSP1_gene,rownames(GSE136035))]
33
In [33]:
LSP1_gene_sub
  1. 'CEBPA'
  2. 'CEBPB'
  3. 'IL1B'
  4. 'CCL3'
  5. 'TNF'
  6. 'CCL4'
  7. 'CCL5'
  8. 'SPP1'
  9. 'CD7'
  10. 'MSR1'
  11. 'ITGAM'
  12. 'KIT'
  13. 'CD14'
  14. 'CSF3R'
  15. 'IL1R2'
  16. 'CSF1R'
  17. 'ITGA1'
  18. 'ITGA5'
  19. 'ITGB5'
  20. 'ITGB1'
  21. 'CAMP'
  22. 'THBS1'
  23. 'C3'
  24. 'MEFV'
  25. 'FOS'
  26. 'CYBB'
  27. 'CARD9'
  28. 'IRF7'
  29. 'OAS3'
  30. 'NLRP3'
  31. 'CTSB'
  32. 'IKBKE'
  33. 'MPO'
In [34]:
fig2_1 <- FeaturePlot(GSE136035, features = LSP1_gene_sub[1:9], ncol = 3, reduction = "umap")
fig2_2 <- FeaturePlot(GSE136035, features = LSP1_gene_sub[10:18], ncol = 3, reduction = "umap")
fig2_3 <- FeaturePlot(GSE136035, features = LSP1_gene_sub[19:27], ncol = 3, reduction = "umap")
fig2_4 <- FeaturePlot(GSE136035, features = LSP1_gene_sub[28:33], ncol = 3, reduction = "umap")

pdf(paste0(out_path, "LSP1_marker_feature.pdf"), width=12, height=10)
fig2_1
fig2_2
fig2_3
fig2_4
dev.off()
pdf: 2
In [35]:
GSE136035[["LSP1_score"]] <- setZ(LSP1_gene_sub,rownames(GSE136035@assays$RNA$scale.data),GSE136035@assays$RNA$scale.data)
In [36]:
head(GSE136035$LSP1_score)
GSM4039805_AAACCTGAGGCATGGT-1
-0.122413421523912
GSM4039805_AAACCTGCACATGTGT-1
-0.267874338036728
GSM4039805_AAACCTGCAGTTCATG-1
-1.44750080367965
GSM4039805_AAACCTGGTATCAGTC-1
-1.39620127022624
GSM4039805_AAACCTGGTGCCTGGT-1
-0.121238624479262
GSM4039805_AAACCTGTCTATCGCC-1
0.3778473427849
In [37]:
FeaturePlot(GSE136035, features = "LSP1_score", reduction = "umap") + 
  scale_color_gradient2(low = "white", mid = "white", high = "blue", midpoint = 0)
FeaturePlot(GSE136035, features = "LSP1_score", reduction = "umap")
GSE136035$LSP1_score_pos <- GSE136035$LSP1_score > 0
DimPlot(GSE136035, group.by = "LSP1_score_pos", reduction = "umap")
FeaturePlot(GSE136035, features = "ABC_score") + 
  scale_color_gradient2(low = "white", mid = "white", high = "blue", midpoint = 0)
FeaturePlot(GSE136035, features = "ABC_score", reduction = "umap")
GSE136035$ABC_score_pos <- GSE136035$ABC_score > 0
DimPlot(GSE136035, group.by = "ABC_score_pos", reduction = "umap")

pdf(paste0(out_path, "score_feature.pdf"), width=8, height=6)
FeaturePlot(GSE136035, features = "LSP1_score", reduction = "umap") + 
  scale_color_gradient2(low = "white", mid = "white", high = "blue", midpoint = 0)
FeaturePlot(GSE136035, features = "LSP1_score", reduction = "umap")
DimPlot(GSE136035, group.by = "LSP1_score_pos", reduction = "umap")
FeaturePlot(GSE136035, features = "ABC_score", reduction = "umap") + 
  scale_color_gradient2(low = "white", mid = "white", high = "blue", midpoint = 0)
FeaturePlot(GSE136035, features = "ABC_score", reduction = "umap")
DimPlot(GSE136035, group.by = "ABC_score_pos", reduction = "umap")
dev.off()
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
No description has been provided for this image
No description has been provided for this image
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
pdf: 2
No description has been provided for this image
In [38]:
B_marker <- c("CD19", "CD79A", "CD79B", "MS4A1", "CD27", "CD24", "CD38", "IGHD")
T_marker <- c("CD4", "CD8A", "CD69", "CD70", "CD80", "CD40LG", "PDCD1", "HAVCR2")
FeaturePlot(GSE136035, features = B_marker, ncol = 3)
FeaturePlot(GSE136035, features = T_marker, ncol = 3)

pdf(paste0(out_path, "marker_expression.pdf"), width=12, height=12)
FeaturePlot(GSE136035, features = B_marker, ncol = 3, reduction = "umap")
FeaturePlot(GSE136035, features = T_marker, ncol = 3, reduction = "umap")
dev.off()
No description has been provided for this image
pdf: 2
No description has been provided for this image
In [39]:
LSP1_pos_ind <- GSE136035$LSP1_score_pos==TRUE & GSE136035$ABC_score_pos==FALSE
ABC_pos_ind <- GSE136035$LSP1_score_pos==FALSE & GSE136035$ABC_score_pos==TRUE
DP_ind <- GSE136035$LSP1_score_pos==TRUE & GSE136035$ABC_score_pos==TRUE
DN_ind <- GSE136035$LSP1_score_pos==FALSE & GSE136035$ABC_score_pos==FALSE
sum(LSP1_pos_ind)
sum(ABC_pos_ind)
sum(DP_ind)
sum(DN_ind)
4169
7077
3945
12407
In [40]:
Idents(GSE136035, cells = colnames(GSE136035)[LSP1_pos_ind]) <- "LSP1_positive"
Idents(GSE136035, cells = colnames(GSE136035)[ABC_pos_ind]) <- "ABC_positive"
Idents(GSE136035, cells = colnames(GSE136035)[DP_ind]) <- "Double_positive"
Idents(GSE136035, cells = colnames(GSE136035)[DN_ind]) <- "Double_negative"
DimPlot(GSE136035, reduction = "umap")
pdf(paste0(out_path,"umap_ABC_LSP1.pdf"), width = 8, height = 6)
DimPlot(GSE136035, reduction = "umap")
dev.off()
pdf: 2
No description has been provided for this image
In [41]:
if (!file.exists(paste0(out_path,"allmarkers.rds"))) {
    markers <- FindAllMarkers(GSE136035, min.pct = 0.1, test.use = 'LR', logfc.threshold = 0.25)
}
if (!exists("markers")) {
    markers <- readRDS(paste0(out_path,"allmarkers.rds"))
}
Calculating cluster Double_negative

Calculating cluster Double_positive

Calculating cluster ABC_positive

Calculating cluster LSP1_positive

In [42]:
GSE136035
An object of class Seurat 
45072 features across 27598 samples within 1 assay 
Active assay: RNA (45072 features, 2000 variable features)
 3 layers present: counts, data, scale.data
 4 dimensional reductions calculated: pca, umap.unintegrated, harmony, umap
In [43]:
markers %>%
  group_by(cluster) %>%
  filter(avg_log2FC > 0) %>%
  filter(p_val_adj < 0.05) %>%
  arrange(desc(avg_log2FC)) %>%
  slice_head(n = 5) %>%
  ungroup() -> top5
# DoHeatmap(pbmc, features = top5$gene) + NoLegend()
DotPlot(GSE136035, features = top5$gene[!duplicated(top5$gene)] ) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
  scale_color_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0)
pdf(paste0(out_path, "Dotplot.pdf"), width = 8, height = 6)
DotPlot(GSE136035, features = top5$gene[!duplicated(top5$gene)] ) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
  scale_color_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0)
dev.off()
Warning message:
“Scaling data with a low number of groups may produce misleading results”
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
Warning message:
“Scaling data with a low number of groups may produce misleading results”
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
pdf: 2
No description has been provided for this image
In [44]:
unique(Idents(GSE136035))
  1. ABC_positive
  2. LSP1_positive
  3. Double_positive
  4. Double_negative
Levels:
  1. 'Double_negative'
  2. 'Double_positive'
  3. 'ABC_positive'
  4. 'LSP1_positive'
In [45]:
LSP1vsABC.marker <- FindMarkers(GSE136035, min.pct = 0.1, test.use = 'LR', logfc.threshold = 0.1, ident.1 = "LSP1_positive", ident.2 = "ABC_positive")
source("~/projects/MK_Rcode/volcano_scdata.R")
In [46]:
vol1 <- volcano_scdata(LSP1vsABC.marker, "LSP1 vs ABC")
pdf(paste0(out_path,"LSP1_vs_ABC_volcano.pdf"), width = 6, height = 6)
vol1
dev.off()
pdf: 2
No description has been provided for this image
In [47]:
library(clusterProfiler)
library(enrichplot)
library(msigdbr)
library(stringr)
set.seed(2025)

clusterProfiler v4.14.6 Learn more at https://yulab-smu.top/contribution-knowledge-mining/

Please cite:

T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan,
X Fu, S Liu, X Bo, and G Yu. clusterProfiler 4.0: A universal
enrichment tool for interpreting omics data. The Innovation. 2021,
2(3):100141


Attaching package: ‘clusterProfiler’


The following object is masked from ‘package:stats’:

    filter


enrichplot v1.26.6 Learn more at https://yulab-smu.top/contribution-knowledge-mining/

Please cite:

Guangchuang Yu. Using meshes for MeSH term enrichment and semantic
analyses. Bioinformatics. 2018, 34(21):3766-3767,
doi:10.1093/bioinformatics/bty410

For full functionality, please install the 'msigdbdf' package with:
install.packages('msigdbdf', repos = 'https://igordot.r-universe.dev')

In [48]:
hallmark_geneset <- msigdbr(species = "Homo sapiens",  category = "H")
hallmark_geneset$gs_name_short <- str_extract(hallmark_geneset$gs_name, "(?<=HALLMARK_).*$")
sort_glist <- LSP1vsABC.marker$avg_log2FC
names(sort_glist) <- rownames(LSP1vsABC.marker)
sort_glist <- sort(sort_glist, decreasing = T)
gsea_result.LSP1vsABS <- GSEA(geneList = sort_glist, TERM2GENE = select(hallmark_geneset, gs_name_short, gene_symbol), seed = T)
dotplot(gsea_result.LSP1vsABS, showCategory=10, split=".sign") + facet_grid(.~.sign)
gsea_result.LSP1vsABS <- GSEA(geneList = sort_glist, TERM2GENE = select(hallmark_geneset, gs_name_short, gene_symbol), seed = T, pvalueCutoff = 0.99)
if (!file.exists(paste0(out_path,"gsea_GSE136035_050725.rds"))) {
    saveRDS(gsea_result.LSP1vsABS, paste0(out_path,"gsea_GSE136035_050725.rds"))
}
Warning message:
“The `category` argument of `msigdbr()` is deprecated as of msigdbr 10.0.0.
ℹ Please use the `collection` argument instead.”
The 'msigdbdf' package must be installed to access the full dataset.

Please run the following command to install the 'msigdbdf' package:
install.packages('msigdbdf', repos = 'https://igordot.r-universe.dev')

using 'fgsea' for GSEA analysis, please cite Korotkevich et al (2019).


preparing geneSet collections...

GSEA analysis...

leading edge analysis...

done...

using 'fgsea' for GSEA analysis, please cite Korotkevich et al (2019).


preparing geneSet collections...

GSEA analysis...

leading edge analysis...

done...

No description has been provided for this image
In [49]:
pdf(paste0(out_path, "GSEA_LSP1vsABS.pdf"), width = 8, height = 6)
dotplot(gsea_result.LSP1vsABS, showCategory=10, split=".sign") + facet_grid(.~.sign)
dev.off()
pdf: 2
In [50]:
if (!file.exists(paste0(out_path,"GSE136035_metadata.rds"))) {
    saveRDS(GSE136035@meta.data, file = paste0(out_path,"GSE136035_metadata.rds"))
}
In [55]:
if (!file.exists(paste0(out_path,"GSE136035_Sobj_050825.rds"))) {
    saveRDS(GSE136035, file = paste0(out_path,"GSE136035_Sobj_050825.rds"))
}
In [53]:
out_path
'~/projects/2024/RA/GSE136035_RAW/output/'